HDF5 Overview

Tip: For information on the current HDF5 version, enter the following at the IDL prompt:

HELP, 'hdf5', /DLM

The Hierarchical Data Format (HDF) version 5 file format was designed for scientific data consisting of a hierarchy of datasets and attributes (or metadata). HDF is a product of the National Center for Supercomputing Applications (NCSA), which supplies the underlying C-language library; IDL provides access to this library via a set of procedures and functions contained in a dynamically loadable module (DLM).

IDL’s HDF5 routines all begin with the prefix "H5_" or "H5*_".

Programming Model

Hierarchical Data Format files are organized in a hierarchical structure. The two primary structures are groups, which contain other groups or datasets, and datasets, which hold the data itself.

HDF attributes are small named datasets that are attached to primary datasets, groups, or named datatypes.

Code Examples

Reading an Image

The following example opens up the hdf5_test.h5 file and reads in a sample image. It is assumed that the user already knows the dataset name, either from using h5dump, or the H5G_GET_MEMBER_NAME function.

PRO ex_read_hdf5

; Open the HDF5 file.

file = FILEPATH('hdf5_test.h5', SUBDIRECTORY=['examples', 'data'])

file_id = H5F_OPEN(file)

; Open the image dataset within the file.

; This is located within the /images group.

; We could also have used H5G_OPEN to open up the group first.

dataset_id1 = H5D_OPEN(file_id, '/images/Eskimo')

 

; Read in the actual image data.

image = H5D_READ(dataset_id1)

 

; Open up the dataspace associated with the Eskimo image.

dataspace_id = H5D_GET_SPACE(dataset_id1)

 

; Retrieve the dimensions so we can set the window size.

dimensions = H5S_GET_SIMPLE_EXTENT_DIMS(dataspace_id)

; Now open and read the color palette associated with

; this image.

dataset_id2 = H5D_OPEN(file_id, '/images/Eskimo_palette')

palette = H5D_READ(dataset_id2)

; Close all our identifiers so we don't leak resources.

H5S_CLOSE, dataspace_id

H5D_CLOSE, dataset_id1

H5D_CLOSE, dataset_id2

H5F_CLOSE, file_id

; Display the data.

DEVICE, DECOMPOSED=0

WINDOW, XSIZE=dimensions[0], YSIZE=dimensions[1]

TVLCT, palette[0,*], palette[1,*], palette[2,*]

 

; Use /ORDER since the image is stored top-to-bottom.

TV, image, /ORDER

END
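
When the dataset names are not known in advance, the members of a group can be listed before any dataset is opened. The following is a minimal sketch, assuming the same hdf5_test.h5 file and its /images group; the procedure name ex_list_hdf5_members is made up for illustration.

```idl
PRO ex_list_hdf5_members

  ; Open the HDF5 file.
  file = FILEPATH('hdf5_test.h5', SUBDIRECTORY=['examples', 'data'])
  file_id = H5F_OPEN(file)

  ; Count the members of the /images group.
  n = H5G_GET_NMEMBERS(file_id, '/images')

  ; Print the name of each member; indices are zero-based.
  FOR i = 0, n-1 DO PRINT, H5G_GET_MEMBER_NAME(file_id, '/images', i)

  ; Close the file identifier.
  H5F_CLOSE, file_id

END
```

Any name printed here can be appended to '/images/' and passed to H5D_OPEN, as in the example above.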

Reading a Subselection

The following example reads only a portion of the previous image, using the dataspace keywords to H5D_READ.

PRO ex_read_hdf5_select

; Open the HDF5 file.

file = FILEPATH('hdf5_test.h5', SUBDIRECTORY=['examples', 'data'])

file_id = H5F_OPEN(file)

; Open the image dataset within the file.

dataset_id1 = H5D_OPEN(file_id, '/images/Eskimo')

 

; Open up the dataspace associated with the Eskimo image.

dataspace_id = H5D_GET_SPACE(dataset_id1)

 

; Now choose our hyperslab. We will pick out only the central

; portion of the image.

start = [100, 100]

count = [200, 200]

; Be sure to use /RESET to turn off all other

; selected elements.

H5S_SELECT_HYPERSLAB, dataspace_id, start, count, STRIDE=[2, 2], /RESET

; Create a simple dataspace to hold the result. If we didn't

; supply the memory dataspace, then the result would be the

; same size as the image dataspace, with zeroes everywhere

; except our hyperslab selection.

memory_space_id = H5S_CREATE_SIMPLE(count)

 

; Read in the actual image data.

image = H5D_READ(dataset_id1, FILE_SPACE=dataspace_id, MEMORY_SPACE=memory_space_id)

; Now open and read the color palette associated with

; this image.

dataset_id2 = H5D_OPEN(file_id, '/images/Eskimo_palette')

palette = H5D_READ(dataset_id2)

; Close all our identifiers so we don't leak resources.

H5S_CLOSE, memory_space_id

H5S_CLOSE, dataspace_id

H5D_CLOSE, dataset_id1

H5D_CLOSE, dataset_id2

H5F_CLOSE, file_id

; Display the data.

DEVICE, DECOMPOSED=0

WINDOW, XSIZE=count[0], YSIZE=count[1]

TVLCT, palette[0,*], palette[1,*], palette[2,*]

 

; We need to use /ORDER since the image is stored

; top-to-bottom.

TV, image, /ORDER

END

Creating a Data File

The following example creates a simple HDF5 data file with a single sample data set. The file is created in the current working directory.

PRO ex_create_hdf5

file = 'hdf5_out.h5'

fid = H5F_CREATE(file)

;; create data

data = hanning(100,150)

;; get data type and space, needed to create the dataset

datatype_id = H5T_IDL_CREATE(data)

dataspace_id = H5S_CREATE_SIMPLE(size(data,/DIMENSIONS))

;; create dataset in the output file

dataset_id = H5D_CREATE(fid, 'Sample data', datatype_id, dataspace_id)

;; write data to dataset

H5D_WRITE,dataset_id,data

;; close all open identifiers

H5D_CLOSE,dataset_id

H5S_CLOSE,dataspace_id

H5T_CLOSE,datatype_id

H5F_CLOSE,fid

END

Reading Partial Datasets

To read a portion of a compound dataset or attribute, create a datatype that matches only the elements you wish to retrieve, and specify that datatype as the second argument to the H5D_READ function. The following example creates a simple HDF5 data file in the current directory, then opens the file and reads a portion of the data.

; Create sample data in an array of structures with two fields

struct = {time:0.0, data:intarr(40)}

r = REPLICATE(struct,20)

r.time = RANDOMU(seed,20)*1000

r.data = INDGEN(40,20)

; Create a file

file = 'h5_test.h5'

fid = H5F_CREATE(file)

; Create a datatype based on a single element of the array

dt = H5T_IDL_CREATE(struct)

; Create a 20 element dataspace

ds = H5S_CREATE_SIMPLE(N_ELEMENTS(r))

; Create and write the dataset

d = H5D_CREATE(fid, 'dataset', dt, ds)

H5D_WRITE, d, r

; Close the dataset, dataspace, datatype, and file

H5D_CLOSE, d

H5S_CLOSE, ds

H5T_CLOSE, dt

H5F_CLOSE, fid

; Open the file for reading

fid = H5F_OPEN(file)

; Open the dataset

d = H5D_OPEN(fid, 'dataset')

; Define the data we want to read from the dataset

struct = {data:intarr(40)}

; Create datatype denoting the portion to be read

dt = H5T_IDL_CREATE(struct)

; Read only the data that matches our datatype. The

; returned value will be a 20-element array of structures

; with only one tag, 'DATA', each holding a 40-element

; integer array.

result = H5D_READ(d, dt)

; Close the datatype, dataset, and file

H5T_CLOSE, dt

H5D_CLOSE, d

H5F_CLOSE, fid

The IDL HDF5 Library

The IDL HDF5 library consists of an almost direct mapping between the HDF5 library functions and the IDL functions and procedures. The relationship between the IDL routines and the HDF5 library is described in the following subsections.

Routine Names

The IDL routine names are typically identical to the HDF5 function names, with the exception that an underscore is added between the prefix and the actual function. For example, the C function H5get_libversion() is implemented by the IDL function H5_GET_LIBVERSION.

The IDL HDF5 library contains the following function categories:

Prefix    Category      Purpose
H5        Library       General library tasks
H5A       Attribute     Manipulate attribute datasets
H5D       Dataset       Manipulate general datasets
H5F       File          Create, open, and close files
H5G       Group         Handle groups of other groups or datasets
H5I       Identifier    Query object identifiers
H5R       Reference     Reference identifiers
H5S       Dataspace     Handle dataspace dimensions and selection
H5T       Datatype      Handle dataset element information

Functions Versus Procedures

HDF5 functions that only return an error code are typically implemented as IDL procedures. An example is H5F_CLOSE, which takes a single file identifier number as the argument and closes the file. HDF5 functions that return values are implemented as IDL functions. An example is H5F_OPEN, which takes a filename as the argument and returns a file identifier number.

Error Handling

All HDF5 functions that return an error or status code are checked for failure. If an error occurs, the HDF5 error handling code is called to retrieve the internal HDF5 error message. This error message is printed to the output window, and program execution stops.
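
Because a failed HDF5 call stops execution, a program that must keep running can trap the error with IDL's standard CATCH mechanism. The sketch below assumes that HDF5 errors are reported like any other IDL error. The function name ex_safe_h5f_open is hypothetical, and returning -1 on failure is a choice made for this illustration only.

```idl
FUNCTION ex_safe_h5f_open, file

  ; Install an error handler. Control returns to the line after
  ; CATCH, with err set to a nonzero value, if a later call fails.
  CATCH, err
  IF err NE 0 THEN BEGIN
    CATCH, /CANCEL
    PRINT, 'Could not open ' + file + ': ' + !ERROR_STATE.MSG
    RETURN, -1
  ENDIF

  ; Attempt to open the file. On failure, the handler above runs.
  RETURN, H5F_OPEN(file)

END
```

A caller would then test the result, for example with IF file_id LT 0 THEN RETURN, before passing it to other H5 routines.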

Dimension Order

HDF5 uses C row-major ordering instead of IDL column-major ordering. For row major, the first listed dimension varies slowest, while for column major the first listed dimension varies fastest. The IDL HDF5 library handles this difference by automatically reversing the dimensions for all functions that accept lists of dimensions.

For example, an HDF5 file may be known to contain a dataset with dimensions [5][10][50], either as declared in the C code or as shown in the output of the h5dump utility. When this dataset is read into IDL, the resulting array has dimensions [50, 10, 5], as reported by the IDL HELP procedure.
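
The reversal can be observed without a file, since dataspaces exist independently of datasets. The following sketch, which uses an arbitrary FLTARR as sample data, creates a dataspace from IDL-ordered dimensions and queries it again.

```idl
; A dataset declared in C as [5][10][50] corresponds to this IDL array.
data = FLTARR(50, 10, 5)

; Pass the dimensions in IDL order. The library reverses them
; internally before calling the underlying C function.
space_id = H5S_CREATE_SIMPLE(SIZE(data, /DIMENSIONS))

; The query reverses them again, so they come back in IDL order.
PRINT, H5S_GET_SIMPLE_EXTENT_DIMS(space_id)

; Release the dataspace identifier.
H5S_CLOSE, space_id
```

In other words, IDL code never needs to reverse dimensions by hand; the reversed order only appears when comparing against C code or h5dump output.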

HDF5 Datatypes

In HDF5, a datatype is an object that describes the storage format of the individual data points of a data set. There are two categories of datatypes: atomic and compound.

Compound Datatypes

HDF5 compound datatypes can be compared to C structures, Fortran structures, or SQL records. Compound datatypes can be nested; there is no limitation to the complexity of a compound datatype. Each member of a compound datatype must have a descriptive name, which is the key used to uniquely identify the member within the compound datatype.

Use one of the H5T_COMPOUND_CREATE or H5T_IDL_CREATE routines to create compound datatypes. Use the following routines to work with compound datatypes:

Example

See H5F_CREATE for an extensive example using compound datatypes.
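
As a smaller illustration, the following sketch builds a compound datatype from an IDL structure and lists its member names. The structure fields (time and data) are sample values chosen for this example, and no file is required.

```idl
; Define a structure with two fields and create a matching
; compound datatype.
struct = {time:0.0, data:INTARR(40)}
dt = H5T_IDL_CREATE(struct)

; Query the number of members in the compound datatype.
n = H5T_GET_NMEMBERS(dt)

; Print the name of each member; indices are zero-based.
FOR i = 0, n-1 DO PRINT, H5T_GET_MEMBER_NAME(dt, i)

; Release the datatype identifier.
H5T_CLOSE, dt
```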

Opaque Datatypes

An opaque datatype contains a series of bytes. It always contains a single element, regardless of the length of the series of bytes it contains.

When an IDL variable is written to a dataset or attribute defined as an opaque datatype, it is written as a string of bytes with no demarcation. When data in an opaque datatype is read into an IDL variable, it is returned as a byte array. Use the FIX routine to convert the returned byte array to the appropriate IDL data type.

Use the H5T_IDL_CREATE routine with the OPAQUE keyword to create opaque datatypes. To create an opaque array, use an opaque datatype with the H5T_ARRAY_CREATE routine. A single string tag can be assigned to an opaque datatype to provide auxiliary information about what is contained therein. Create tags using the H5T_SET_TAG routine; retrieve tags using the H5T_GET_TAG routine. HDF5 limits the length of the tag to 255 characters.

Example

The following example creates an opaque datatype and stores within it a 20-element integer array.

; Create a file to hold the data

file = 'h5_test.h5'

fid = H5F_CREATE(file)

; Create some data

data = INDGEN(20)

; Create an opaque datatype

dt = H5T_IDL_CREATE(data, /OPAQUE)

; Create a single element dataspace

ds = H5S_CREATE_SIMPLE(1)

; Create and write the dataset

d = H5D_CREATE(fid, 'dataset', dt, ds)

H5D_WRITE, d, data

; Close the dataset, dataspace, datatype, and file

H5D_CLOSE, d

H5S_CLOSE, ds

H5T_CLOSE, dt

H5F_CLOSE, fid

; Reopen file for reading

fid = H5F_OPEN(file)

; Read in the data

d = H5D_OPEN(fid, 'dataset')

result = H5D_READ(d)

; Close the dataset and the file

H5D_CLOSE, d

H5F_CLOSE, fid

HELP, result

IDL prints:

RESULT BYTE = Array[40]

Note that the result is a 40-element byte array, since each integer requires two bytes.
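
To recover the original integers, reinterpret the bytes with FIX, passing an offset and the desired dimensions. The sketch below stands in for the file read by packing the integers into bytes with the BYTE function, so it runs on its own; it assumes the bytes were produced on a machine with the same byte order as the one reading them.

```idl
; Pack 20 integers into 40 raw bytes, in the form that H5D_READ
; returns for an opaque dataset.
bytes = BYTE(INDGEN(20), 0, 40)

; Reinterpret the bytes as 20 two-byte integers, starting at
; byte offset 0.
ints = FIX(bytes, 0, 20)

; ARRAY_EQUAL returns 1 when every element matches the original.
PRINT, ARRAY_EQUAL(ints, INDGEN(20))
```

With the example above, the equivalent call would be FIX(result, 0, 20).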

Enumeration Datatypes

An enumeration datatype consists of a set of (Name, Value) pairs, where each Name is a string and each Value is an integer.

Note: Name/value pairs must be assigned to the datatype before it is used to create a dataset. The dataset stores the state of the datatype at the time the dataset is created; additional changes to the datatype will not be reflected in the dataset.

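Create the enumeration datatype using the H5T_ENUM_CREATE function. Once you have created an enumeration datatype, use H5T_ENUM_INSERT to add a name/value pair, H5T_ENUM_NAMEOF to retrieve the name associated with a value, and H5T_ENUM_VALUEOF to retrieve the value associated with a name.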

These routines replicate the facilities provided by the underlying HDF5 library, which deals only with single name/value pairs. To make it easier to read and write entire enumerated lists, IDL provides two helper routines that package the name/value pairs in arrays of IDL_H5_ENUM structures, which have the following definition:

{IDL_H5_ENUM, NAME:'', VALUE:0}

The routines are H5T_ENUM_GET_DATA and H5T_ENUM_SET_DATA.

The H5T_ENUM_VALUES_TO_NAMES function is a helper routine that lets you retrieve the names associated with an array of values in a single operation.

The following routines may also be useful when working with enumeration datatypes:

H5T_GET_MEMBER_INDEX, H5T_GET_MEMBER_NAME, H5T_GET_MEMBER_VALUE

Example

The following example creates an enumeration datatype and saves it to a file. The example then reopens the file and reads the data, printing the names.

; Create a file to hold the data

file = 'h5_test.h5'

fid = H5F_CREATE(file)

; Create arrays to serve as name/value pairs

names = ['dog', 'pony', 'turtle', 'emu', 'wildebeest']

values = INDGEN(5)+1

; Create the enumeration datatype

dt = H5T_ENUM_CREATE()

; Associate the name/value pairs with the datatype

H5T_ENUM_SET_DATA, dt, names, values

; Create a dataspace, then create and write the dataset

ds = H5S_CREATE_SIMPLE(N_ELEMENTS(values))

d = H5D_CREATE(fid, 'dataset', dt, ds)

H5D_WRITE, d, values

; Close the dataset, dataspace, datatype, and file

H5D_CLOSE, d

H5S_CLOSE, ds

H5T_CLOSE, dt

H5F_CLOSE, fid

; Reopen file for reading

fid = H5F_OPEN(file)

; Read in the data

d = H5D_OPEN(fid, 'dataset')

dt = H5D_GET_TYPE(d)

result = H5D_READ(d)

; Close the file

H5F_CLOSE, fid

; Print the value associated with the name "pony"

PRINT, H5T_ENUM_VALUEOF(dt, 'pony')

; Print all the name strings

PRINT, H5T_ENUM_VALUES_TO_NAMES(dt, result)
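
The complete list of name/value pairs can also be retrieved in one call as an array of IDL_H5_ENUM structures. The following sketch works on a datatype held in memory, so no file is needed; the names and values are sample data.

```idl
; Create an enumeration datatype and assign two name/value pairs.
dt = H5T_ENUM_CREATE()
H5T_ENUM_SET_DATA, dt, ['dog', 'pony'], [1, 2]

; Retrieve all pairs as an array of IDL_H5_ENUM structures.
pairs = H5T_ENUM_GET_DATA(dt)

; Print each name with its value.
FOR i = 0, N_ELEMENTS(pairs)-1 DO PRINT, pairs[i].name, pairs[i].value

; Release the datatype identifier.
H5T_CLOSE, dt
```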

Variable Length Array Datatypes

HDF5 provides support for variable length arrays, but IDL itself does not. As a result, in order to store data in an HDF5 variable length array you must:

  1. Create a series of vectors of data in IDL, each with a potentially different length. All vectors must be of the same data type.
  2. Store a pointer to each data vector in the PDATA field of an IDL_H5_VLEN structure. The IDL_H5_VLEN structure is defined as follows:

    { IDL_H5_VLEN, pdata:PTR_NEW() }

  3. Create an array of IDL_H5_VLEN structures that will be stored as an HDF5 variable length array.
  4. Create a base HDF5 datatype from one of the data vectors.
  5. Create an HDF5 variable length datatype from the base datatype.
  6. Create an HDF5 dataspace of the appropriate size.
  7. Create an HDF5 dataset.
  8. Write the array of IDL_H5_VLEN structures to the HDF5 dataset.

Note: IDL string arrays are a special case: see Variable Length String Arrays for details.

Creating a variable length array datatype is a two-step process. First, you must create a base datatype using the H5T_IDL_CREATE function; all data in the variable length array must be of this datatype. Second, you create a variable length array datatype using the base datatype as an input to the H5T_VLEN_CREATE function.

Note: No explicit size is provided to the H5T_VLEN_CREATE function; sizes are determined as needed by the data being written.

Example: Writing a Variable Length Array

; Create a file to hold the data

file = 'h5_test.h5'

fid = H5F_CREATE(file)

; Create three vectors containing integer data

a = INDGEN(2)

b = INDGEN(3)

c = 3

; Create an array of three IDL_H5_VLEN structures

sArray = REPLICATE({IDL_H5_VLEN},3)

; Populate the IDL_H5_VLEN structures with pointers to

; the three data vectors

sArray[0].pdata = PTR_NEW(a)

sArray[1].pdata = PTR_NEW(b)

sArray[2].pdata = PTR_NEW(c)

; Create a datatype based on one of the data vectors

dt1 = H5T_IDL_CREATE(a)

; Create a variable length datatype based on the previously-

; created datatype

dt = H5T_VLEN_CREATE(dt1)

; Create a dataspace

ds = H5S_CREATE_SIMPLE(N_ELEMENTS(sArray))

; Create the dataset

d = H5D_CREATE(fid,'dataset', dt, ds)

; Write the array of structures to the dataset

H5D_WRITE, d, sArray

Examples: Reading a Variable Length Array

Using the H5D_READ function to read data written as a variable length array creates an array of IDL_H5_VLEN structures. The following examples show how to refer to individual data elements of various HDF5 datatypes.

Atomic HDF5 Datatypes

To read and access data stored in variable length arrays of atomic HDF5 datatypes, dereference the pointer stored in the PDATA field of the appropriate IDL_H5_VLEN structure. For example, to retrieve the variable b from the data written in the above example:

data = H5D_READ(d)

b = *data[1].pdata

Compound HDF5 Datatypes

If you have a variable length array of compound datatypes, the field named tag in the jth structure of the ith element of the variable length array would be accessed as follows:

data = H5D_READ(d)

a = (*data[i].pdata)[j].tag

Variable Length Arrays of Variable Length Arrays

If you have a variable length array of variable length arrays of integers, the kth integer of the jth element of a variable length array stored in the ith element of a variable length array would be accessed as follows:

data = H5D_READ(d)

a = (*(*data[i].pdata)[j].pdata)[k]

Compound Datatypes Containing Variable Length Arrays

If you have a compound datatype containing a variable length array, the kth data element of the jth variable length array in the ith compound datatype would be accessed as follows:

data = H5D_READ(d)

a = (*data[i].vl_array[j].pdata)[k]

Variable Length String Arrays

Because the data vectors referenced by the pointers stored in the PDATA field of the IDL_H5_VLEN structure must all have the same type and dimension, strings are handled as vectors of individual characters rather than as atomic units. This means that each element in a string array must be assigned to an individual IDL_H5_VLEN structure:

str = ['dog', 'dragon', 'duck']

sArray = REPLICATE({IDL_H5_VLEN},3)

sArray[0].pdata = ptr_new(str[0])

sArray[1].pdata = ptr_new(str[1])

sArray[2].pdata = ptr_new(str[2])

Use the H5T_STR_TO_VLEN function to assist in converting between an IDL string array and an HDF5 variable length string array. The following achieves the same result as the above five lines:

str = ['dog', 'dragon', 'duck']

sArray = H5T_STR_TO_VLEN(str)

Similarly, if you have an HDF5 variable length array containing string data, use the H5T_VLEN_TO_STR function to access the string data:

data = H5D_READ(d)

str = H5T_VLEN_TO_STR(data)
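
The two conversion functions can be checked against each other without writing a file. The following is a minimal round-trip sketch; the variable name back is arbitrary, and the explicit PTR_FREE call reflects the assumption that the caller is responsible for releasing the heap variables created by the conversion.

```idl
; Convert a string array to an array of IDL_H5_VLEN structures.
str = ['dog', 'dragon', 'duck']
sArray = H5T_STR_TO_VLEN(str)

; Convert the structures back to a string array.
back = H5T_VLEN_TO_STR(sArray)

; Compare the recovered strings with the originals.
PRINT, ARRAY_EQUAL(str, back)

; Free the heap variables referenced by the PDATA fields.
PTR_FREE, sArray.pdata
```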